perm filename A118.TEX[106,PHY] blob sn#839880 filedate 1987-05-06 generic text, type C, neo UTF8
COMMENT ⊗   VALID 00002 PAGES
C REC  PAGE   DESCRIPTION
C00001 00001
C00002 00002	\magnification\magstephalf
C00009 ENDMK
C⊗;
\magnification\magstephalf
\input macro.tex
\def\today{\ifcase\month\or
  January\or February\or March\or April\or May\or June\or
  July\or August\or September\or October\or November\or December\fi
  \space\number\day, \number\year}
\baselineskip 14pt
\rm
\line{\sevenrm a118.tex[106,phy] \today\hfill}
\font\rmn=cmr9

\bigskip
\line{\bf Generating Random Samples Without Replacement, as Sequences and Sets\hfil}

\bigskip
To generate a random permutation of $N$ out of~$M$:

{\obeylines\obeyspaces\let =\ \tt
        $S←\langle\;\rangle$; $\{$empty sequence$\}$
        FOR J:=M-N+1 TO M DO
            BEGIN
            R:=RANDOM(1,J);
            IF R occurs in S
            THEN insert J in S after R
            ELSE prefix R to S
            END;
        PRINT(S).
}

\noindent
Every permutation is generated, by a not-too-hard inductive proof. The
inductive hypothesis is that after $t$~times through the iteration,
$J=M-N+t$, and $S$ can be any permutation of~$t$ distinct integers
between~1 and~$J$.
The number of distinct permutations is ${M\choose N}$, the same as the
number of sequences of values of~$R$, so every permutation is generated
in exactly one way, so the distribution is uniform.

{\narrower\smallskip\noindent
Inductive Step of Proof:
\smallskip}

{\narrower\smallskip\noindent
Let $S↓t$ mean the value of $S$ after $t$ iterations. Assume that $S↓t$
can be any permutation of~$t$ distinct numbers out of~$J$. Define the
{\it tail\/} of a sequence as all but the first element. To get an arbitrary
permutation~$P$ of $t+1$ out of $J+1$ as $S↓{t+1}$, there are two cases:
\smallskip}

{\narrower\smallskip\noindent
$\bullet$\quad If $J+1$ does not occur in the tail of~$P$, let $S↓t$
be the tail of~$P$ and do one more iteration, where $R↓{t+1}=$ the initial
of~$P$.
\smallskip}

{\narrower\smallskip\noindent
$\bullet$\quad If $J+1$ occurs in the tail of $P$, let $S↓t$ be $P$ with
$J+1$ omitted and do one more iteration, where $R↓{t+1}=$ the value that
precedes $J+1$ in~$P$.
\smallskip}

\noindent
An appropriate data structure for $S$ is a hash table with a linked
list connecting the entries.
If the hash table size is about~$2N$, the expected running
time is $O(N)$.
A cautious version of this representation is a balanced ordered
tree with a linked list running through it, 
for expected and worst case times $O(N\log N)$.

The algorithm is readily adapted to generating random combinations of~$N$
out of~$M$, by letting $S$ be a set:

{\obeylines\obeyspaces\let =\ \tt
        S{$←\emptyset$};
        FOR J:=M-N+1 TO M DO
            BEGIN
            R:=RANDOM(1,J);
            IF R$\in$S
            THEN insert J in S
            ELSE insert R in S
            END;
        Sort S if desired;
        PRINT(S).
}

\noindent
Here an appropriate data structure for $S$ is a bit array, if $M$ is
no more than perhaps $100N$; if $W$ is the number of bits per word, the
run time is virtually constant at $O(N)+O(M/W)$.

For larger $M$, an array of size near $N$ indexed by the high order bits of
the data, of pointers to sorted linked lists
containing (the low order bits of) the data, works
well. Mean execution time is $O(N)$, variance is $O(N)$, and maximum is
$O(N↑2)$.

A cautious implementation uses a balanced ordered tree representation of~$S$.
Mean and worst case execution time are $O(N\log N)$, with small variance.

The random permutation can be treated as a nondeterministic algorithm and
macroexpanded into a backtracking algorithm that generates all permutations
of~$N$ out of~$M$. (See R.~Floyd, ``Nondeterministic Algorithms,''
{\sl JACM\/ \bf 14}, 4, Oct.\ 1967, pp.\ 636--644 for details.)

\bigskip
\parindent0pt
\copyright 1987 Robert W. Floyd

First draft April 13, 1987. 
revised: May 6, 1987.
%subsequently revised.

\bye